Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[nats] NATS backend rewrite, perf and stability improvements #194

Merged
merged 7 commits into from
Nov 10, 2023

Conversation

bruth
Copy link
Contributor

@bruth bruth commented Jul 10, 2023

No description provided.

@bruth bruth changed the title [nats] Handle JetStream unavailability scenarios on startup [nats] NATS backend rewrite, perf and stability improvements Aug 14, 2023
@brandond brandond mentioned this pull request Aug 16, 2023
@bruth
Copy link
Contributor Author

bruth commented Aug 21, 2023

@brandond Quick question on desired Update semantics. The previous implementation of the NATS backend largely assumed a single node since it held a lock during the Update call. It did a read from the NATS KV, did some checks, and then applied a KV Put, which performs a blind write.

However, in a clustered setup, e.g. three k3s servers, each will be watching, competing for leases, etc. so interleaved writes can definitely be attempted. NATS KV has a native Update operation that takes the "expected last sequence/version" of that key to effectively do a conditional write, to prevent last-writer-wins behavior, i.e. serializability per key.

All that said, while watching the logs for which sets of keys get aggressively updated, it seems like it can be problematic, such as /registry/apiregistration.k8s.io/apiservices/v1beta1.metrics.k8s.io.

With the native etcd implementation in clustered mode, are updates serialized per key? Reads/watches are always going to be stale when a write is attempted, so it seems like if the loop is to attempt, then refetch, then retry, this would exacerbate the problem.

@brandond
Copy link
Member

brandond commented Aug 30, 2023

With the native etcd implementation in clustered mode, are updates serialized per key? Reads/watches are always going to be stale when a write is attempted, so it seems like if the loop is to attempt, then refetch, then retry, this would exacerbate the problem.

Kubernetes uses an etcd transaction to ensure that the key revision is the same as what was expected. If the revision matches, then the key is written to. If it does not match, then the current value is returned, and the client gets a conflict error along with a copy of the current object state. The client is expected to merge their intended changes into the current object state, and retry.

If NATS is not honoring this transaction behavior, then there is probably a significant potential for incorrect behavior.

You can find the client's transaction at https://github.com/kubernetes/kubernetes/blob/3cf3702d1e638548ba9fcfd957ebae43e1a573cb/staging/src/k8s.io/apiserver/pkg/storage/etcd3/store.go#L527-L533

@bruth
Copy link
Contributor Author

bruth commented Aug 30, 2023

If NATS is not honoring this transaction behavior [...]

Thanks for the clarification. To confirm, the previous implementation of the backend was not honoring the transactional behavior, but this patch does.

@bruth
Copy link
Contributor Author

bruth commented Oct 31, 2023

There are a couple remaining items to clean up, but this is nearly complete. There are a number of changes, so I am happy to annotate the PR with comments on specific points of interest if that is preferred.

@sttts
Copy link

sttts commented Nov 1, 2023

As the resource version semantics had been missing from the old implementation, it makes me wonder whether Kube conformance tests are run on any of these backends, especially new ones. There are pretty extensive tests around watches and resource versions in there. They would likely have exposed the missing RV support.

@bruth
Copy link
Contributor Author

bruth commented Nov 1, 2023

I agree and would have appreciated those from the get-go. For the previous implementation, it was not advertised as supporting HA mode formally and it also used mutexes to lock on the various key prefixes. This serialized writes, however the versions that were returned when certain operations occurred was not very clear.

My understanding is that the CI runs a set of benchmarks, but I am not sure if this implicitly checks for conformance as well?

@dereknola
Copy link
Member

dereknola commented Nov 1, 2023

The tests do no check for conformance, they only validate basic functionality. IFAIK Conformance checks are only done by QA with a full K3s release.

@dereknola
Copy link
Member

The merging of #233 has likely caused merge conflict, sorry.

@bruth
Copy link
Contributor Author

bruth commented Nov 1, 2023

Understood @dereknola and no worries with the conflict.

FWIW, I did dig into the LimitedServer layer that calls into the Backend interface to see how that layer deals with errors and revisions and what not. Not a true conformance test, but did my best to understand the semantics.

@brandond
Copy link
Member

brandond commented Nov 2, 2023

Looks like there are conflicts - can you rebase, and we'll see if the tests pass?

@bruth
Copy link
Contributor Author

bruth commented Nov 3, 2023

@brandond I see the changes to the Watch API, we rebase and update the code today.

bruth added 3 commits November 4, 2023 06:34
The main new component is a btree cache to optimize
List and Count operations. The List operation assumes
and ordered set of results since a limit can be defined.

A set of tests have been added.

Support for embedded `dataDir` path and `credsFile` URL
options have been added.

Signed-off-by: Byron Ruth <[email protected]>
Signed-off-by: Byron Ruth <[email protected]>
@bruth
Copy link
Contributor Author

bruth commented Nov 6, 2023

Just an FYI, it's likely another NATS server release is going out tomorrow (some fixes along and time with the new Go releases). If we can delay a merge another day, I will update to this new release before getting incorporated into the next k3s release.

@brandond
Copy link
Member

brandond commented Nov 7, 2023

watching here for an update - will merge once updated.

@bruth
Copy link
Contributor Author

bruth commented Nov 8, 2023

2.10.5 is delayed to either tomorrow (Thursday) or Friday. Not sure if you want to move ahead and then I can open a PR to update to the dependency. There are some fixes relevant for the kine/k3s workload, but did not want to block your upstream release.

@brandond
Copy link
Member

brandond commented Nov 8, 2023

I'm in no hurry to get anything out, there are no Kubernetes releases in November so we're not doing a K3s release that would even ship any of this until December.

@bruth
Copy link
Contributor Author

bruth commented Nov 8, 2023

Sounds great, will update later this week.

@bruth
Copy link
Contributor Author

bruth commented Nov 10, 2023

Updated this to 2.10.5 (released yesterday), so ready to merge from my perspective.

@dereknola dereknola merged commit 827ef29 into k3s-io:master Nov 10, 2023
2 checks passed
@bruth bruth deleted the nats-js-retry branch November 11, 2023 02:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants